Aim of this report - to investigate trends in the Australian weather data from 2007-2017 and discuss the following research questions:
Main discoveries:
# Loading weather data from local .csv file
weather = read.csv("data/weatherAUS.csv")
# Quick look at top 6 rows of data
kable(head(weather), "html") %>%
kable_styling(bootstrap_options = c("striped", "hover")) %>%
scroll_box(width = "100%")
| Date | Location | MinTemp | MaxTemp | Rainfall | Evaporation | Sunshine | WindGustDir | WindGustSpeed | WindDir9am | WindDir3pm | WindSpeed9am | WindSpeed3pm | Humidity9am | Humidity3pm | Pressure9am | Pressure3pm | Cloud9am | Cloud3pm | Temp9am | Temp3pm | RainToday | RISK_MM | RainTomorrow |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2008-12-01 | Albury | 13.4 | 22.9 | 0.6 | NA | NA | W | 44 | W | WNW | 20 | 24 | 71 | 22 | 1007.7 | 1007.1 | 8 | NA | 16.9 | 21.8 | No | 0.0 | No |
| 2008-12-02 | Albury | 7.4 | 25.1 | 0.0 | NA | NA | WNW | 44 | NNW | WSW | 4 | 22 | 44 | 25 | 1010.6 | 1007.8 | NA | NA | 17.2 | 24.3 | No | 0.0 | No |
| 2008-12-03 | Albury | 12.9 | 25.7 | 0.0 | NA | NA | WSW | 46 | W | WSW | 19 | 26 | 38 | 30 | 1007.6 | 1008.7 | NA | 2 | 21.0 | 23.2 | No | 0.0 | No |
| 2008-12-04 | Albury | 9.2 | 28.0 | 0.0 | NA | NA | NE | 24 | SE | E | 11 | 9 | 45 | 16 | 1017.6 | 1012.8 | NA | NA | 18.1 | 26.5 | No | 1.0 | No |
| 2008-12-05 | Albury | 17.5 | 32.3 | 1.0 | NA | NA | W | 41 | ENE | NW | 7 | 20 | 82 | 33 | 1010.8 | 1006.0 | 7 | 8 | 17.8 | 29.7 | No | 0.2 | No |
| 2008-12-06 | Albury | 14.6 | 29.7 | 0.2 | NA | NA | WNW | 56 | W | W | 19 | 24 | 55 | 23 | 1009.2 | 1005.4 | NA | NA | 20.6 | 28.9 | No | 0.0 | No |
# Size of the data and R's classification of the variables
str(weather)
## 'data.frame': 142193 obs. of 24 variables:
## $ Date : Factor w/ 3436 levels "2007-11-01","2007-11-02",..: 397 398 399 400 401 402 403 404 405 406 ...
## $ Location : Factor w/ 49 levels "Adelaide","Albany",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ MinTemp : num 13.4 7.4 12.9 9.2 17.5 14.6 14.3 7.7 9.7 13.1 ...
## $ MaxTemp : num 22.9 25.1 25.7 28 32.3 29.7 25 26.7 31.9 30.1 ...
## $ Rainfall : num 0.6 0 0 0 1 0.2 0 0 0 1.4 ...
## $ Evaporation : num NA NA NA NA NA NA NA NA NA NA ...
## $ Sunshine : num NA NA NA NA NA NA NA NA NA NA ...
## $ WindGustDir : Factor w/ 16 levels "E","ENE","ESE",..: 14 15 16 5 14 15 14 14 7 14 ...
## $ WindGustSpeed: int 44 44 46 24 41 56 50 35 80 28 ...
## $ WindDir9am : Factor w/ 16 levels "E","ENE","ESE",..: 14 7 14 10 2 14 13 11 10 9 ...
## $ WindDir3pm : Factor w/ 16 levels "E","ENE","ESE",..: 15 16 16 1 8 14 14 14 8 11 ...
## $ WindSpeed9am : int 20 4 19 11 7 19 20 6 7 15 ...
## $ WindSpeed3pm : int 24 22 26 9 20 24 24 17 28 11 ...
## $ Humidity9am : int 71 44 38 45 82 55 49 48 42 58 ...
## $ Humidity3pm : int 22 25 30 16 33 23 19 19 9 27 ...
## $ Pressure9am : num 1008 1011 1008 1018 1011 ...
## $ Pressure3pm : num 1007 1008 1009 1013 1006 ...
## $ Cloud9am : int 8 NA NA NA 7 NA 1 NA NA NA ...
## $ Cloud3pm : int NA NA 2 NA 8 NA NA NA NA NA ...
## $ Temp9am : num 16.9 17.2 21 18.1 17.8 20.6 18.1 16.3 18.3 20.1 ...
## $ Temp3pm : num 21.8 24.3 23.2 26.5 29.7 28.9 24.6 25.5 30.2 28.2 ...
## $ RainToday : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 2 ...
## $ RISK_MM : num 0 0 0 1 0.2 0 0 0 1.4 0 ...
## $ RainTomorrow : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 2 1 ...
We disagree with R’s classification of three of the above variables: Date, RainToday and RainTomorrow.
str(weather$Date)
## Factor w/ 3436 levels "2007-11-01","2007-11-02",..: 397 398 399 400 401 402 403 404 405 406 ...
The ‘Date’ variable should be expressed as a Date object (R’s calendar-date class) instead of being a factor with over 3000 levels.
Let’s format it as a Date object:
format_date = as.Date(weather$Date)
str(format_date)
## Date[1:142193], format: "2008-12-01" "2008-12-02" "2008-12-03" "2008-12-04" "2008-12-05" ...
As a result, more useful information such as the day of the week and the name of the month can be extracted. Let’s look at the day of the week and the month of the first 6 observations:
# Day of the week
head(format(format_date, "%A"))
## [1] "Monday" "Tuesday" "Wednesday" "Thursday" "Friday" "Saturday"
# Abbreviated month
head(format(format_date, "%b"))
## [1] "Dec" "Dec" "Dec" "Dec" "Dec" "Dec"
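The conversion above can be tried on a small example. The following is a minimal sketch on a made-up data frame (`toy` and its values are illustrative assumptions, not part of the weather data). It also shows that date arithmetic becomes possible once the column is a Date.

```r
# Toy data: dates stored as a factor, as read.csv() does by default in R 3.5
toy = data.frame(Date = factor(c("2008-12-01", "2008-12-02", "2008-12-06")))

# as.Date() converts the factor labels, not the underlying integer codes
toy$Date = as.Date(toy$Date)
class(toy$Date)

# Derived calendar columns (the names printed depend on the session locale)
toy$day_name = format(toy$Date, "%A")
toy$month_abbr = format(toy$Date, "%b")

# Date arithmetic now works: days between the first and last observation
as.numeric(diff(range(toy$Date)))
```

From R 4.0 onwards, read.csv() no longer converts strings to factors by default, so the column would arrive as character rather than factor; as.Date() handles both.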
str(weather$RainToday)
## Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 2 ...
str(weather$RainTomorrow)
## Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 2 1 ...
As seen above, RainToday and RainTomorrow are factors with two levels: “No” and “Yes”. However, they are better expressed as a logical type (or Boolean, i.e. TRUE or FALSE).
Let’s convert them so that the str() function outputs the following:
# Changing RainToday to a logical type
levels(weather$RainToday)[1] = FALSE
levels(weather$RainToday)[2] = TRUE
logi_rain_today = as.logical(weather$RainToday)
str(logi_rain_today)
## logi [1:142193] FALSE FALSE FALSE FALSE FALSE FALSE ...
# Changing RainTomorrow to a logical type
levels(weather$RainTomorrow)[1] = FALSE
levels(weather$RainTomorrow)[2] = TRUE
logi_rain_tomorrow = as.logical(weather$RainTomorrow)
str(logi_rain_tomorrow)
## logi [1:142193] FALSE FALSE FALSE FALSE FALSE FALSE ...
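An alternative to relabelling the factor levels is to compare the factor against “Yes” directly. The sketch below uses a made-up factor (`rain_today` is a hypothetical stand-in for weather$RainToday) and leaves the original levels untouched.

```r
# Toy factor with the same "No"/"Yes" levels and one missing value (made-up data)
rain_today = factor(c("No", "Yes", NA, "No"), levels = c("No", "Yes"))

# Comparing against "Yes" gives a logical vector and keeps NA as NA
logi_rain = rain_today == "Yes"
logi_rain

# The original factor is untouched, so its levels are still "No" and "Yes"
levels(rain_today)
```

Later code in this report compares RainToday against "TRUE", so it relies on the relabelled levels; the comparison approach is shown only as an alternative and would need that later code adjusted.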
What is the spread of each variable?
# Looking at the spread of the data
summary(weather)
## Date Location MinTemp MaxTemp
## 2013-03-02: 49 Canberra: 3418 Min. :-8.50 Min. :-4.80
## 2013-03-03: 49 Sydney : 3337 1st Qu.: 7.60 1st Qu.:17.90
## 2013-03-04: 49 Perth : 3193 Median :12.00 Median :22.60
## 2013-03-06: 49 Darwin : 3192 Mean :12.19 Mean :23.23
## 2013-03-07: 49 Hobart : 3188 3rd Qu.:16.80 3rd Qu.:28.20
## 2013-03-10: 49 Brisbane: 3161 Max. :33.90 Max. :48.10
## (Other) :141899 (Other) :122704 NA's :637 NA's :322
## Rainfall Evaporation Sunshine WindGustDir
## Min. : 0.00 Min. : 0.00 Min. : 0.00 W : 9780
## 1st Qu.: 0.00 1st Qu.: 2.60 1st Qu.: 4.90 SE : 9309
## Median : 0.00 Median : 4.80 Median : 8.50 E : 9071
## Mean : 2.35 Mean : 5.47 Mean : 7.62 N : 9033
## 3rd Qu.: 0.80 3rd Qu.: 7.40 3rd Qu.:10.60 SSE : 8993
## Max. :371.00 Max. :145.00 Max. :14.50 (Other):86677
## NA's :1406 NA's :60843 NA's :67816 NA's : 9330
## WindGustSpeed WindDir9am WindDir3pm WindSpeed9am
## Min. : 6.00 N :11393 SE :10663 Min. : 0
## 1st Qu.: 31.00 SE : 9162 W : 9911 1st Qu.: 7
## Median : 39.00 E : 9024 S : 9598 Median : 13
## Mean : 39.98 SSE : 8966 WSW : 9329 Mean : 14
## 3rd Qu.: 48.00 NW : 8552 SW : 9182 3rd Qu.: 19
## Max. :135.00 (Other):85083 (Other):89732 Max. :130
## NA's :9270 NA's :10013 NA's : 3778 NA's :1348
## WindSpeed3pm Humidity9am Humidity3pm Pressure9am
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 980.5
## 1st Qu.:13.00 1st Qu.: 57.00 1st Qu.: 37.00 1st Qu.:1012.9
## Median :19.00 Median : 70.00 Median : 52.00 Median :1017.6
## Mean :18.64 Mean : 68.84 Mean : 51.48 Mean :1017.7
## 3rd Qu.:24.00 3rd Qu.: 83.00 3rd Qu.: 66.00 3rd Qu.:1022.4
## Max. :87.00 Max. :100.00 Max. :100.00 Max. :1041.0
## NA's :2630 NA's :1774 NA's :3610 NA's :14014
## Pressure3pm Cloud9am Cloud3pm Temp9am
## Min. : 977.1 Min. :0.00 Min. :0.0 Min. :-7.20
## 1st Qu.:1010.4 1st Qu.:1.00 1st Qu.:2.0 1st Qu.:12.30
## Median :1015.2 Median :5.00 Median :5.0 Median :16.70
## Mean :1015.3 Mean :4.44 Mean :4.5 Mean :16.99
## 3rd Qu.:1020.0 3rd Qu.:7.00 3rd Qu.:7.0 3rd Qu.:21.60
## Max. :1039.6 Max. :9.00 Max. :9.0 Max. :40.20
## NA's :13981 NA's :53657 NA's :57094 NA's :904
## Temp3pm RainToday RISK_MM RainTomorrow
## Min. :-5.40 FALSE:109332 Min. : 0.000 FALSE:110316
## 1st Qu.:16.60 TRUE : 31455 1st Qu.: 0.000 TRUE : 31877
## Median :21.10 NA's : 1406 Median : 0.000
## Mean :21.69 Mean : 2.361
## 3rd Qu.:26.40 3rd Qu.: 0.800
## Max. :46.70 Max. :371.000
## NA's :2726
How large is the data set?
# Looking at the dimensions of the data
dim(weather)
## [1] 142193 24
It contains 142 193 rows (observations) and 24 columns (variables).
Over what period of time does the data set span?
# Finding the initial and final weather observations
min(as.Date(weather$Date))
## [1] "2007-11-01"
max(as.Date(weather$Date))
## [1] "2017-06-25"
It was collected between November 2007 and June 2017.
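The two calls above can be combined with range(), and the length of the period follows from difftime(). A minimal sketch, assuming three made-up dates in place of weather$Date (only the first and last match the output above):

```r
# Made-up dates standing in for weather$Date
dates = as.Date(c("2007-11-01", "2012-03-15", "2017-06-25"))

# range() returns the earliest and latest date in one call
span = range(dates)
span

# Length of the period in days and, roughly, in years
n_days = as.numeric(difftime(span[2], span[1], units = "days"))
n_days / 365.25
```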
Which locations are used? How many are there?
# Finding the names of each location, sorted in alphabetical order
sort(unique(weather$Location))
## [1] Adelaide Albany Albury AliceSprings
## [5] BadgerysCreek Ballarat Bendigo Brisbane
## [9] Cairns Canberra Cobar CoffsHarbour
## [13] Dartmoor Darwin GoldCoast Hobart
## [17] Katherine Launceston Melbourne MelbourneAirport
## [21] Mildura Moree MountGambier MountGinini
## [25] Newcastle Nhil NorahHead NorfolkIsland
## [29] Nuriootpa PearceRAAF Penrith Perth
## [33] PerthAirport Portland Richmond Sale
## [37] SalmonGums Sydney SydneyAirport Townsville
## [41] Tuggeranong Uluru WaggaWagga Walpole
## [45] Watsonia Williamtown Witchcliffe Wollongong
## [49] Woomera
## 49 Levels: Adelaide Albany Albury AliceSprings BadgerysCreek ... Woomera
From above, 49 locations in Australia are used, running alphabetically from Adelaide to Woomera.
The data was obtained from Kaggle, but it originates from the Australian Government Bureau of Meteorology’s website.
Each row represents one daily weather observation at one location, while each column represents a property of that observation.
The data was combined from two separate data sets: one recording daily weather observations and the other recording climate data. Without knowing how the two were combined, the data’s validity comes into question.
In spite of this, the data’s origin in the Australian Government Bureau of Meteorology suggests a high degree of validity.
Other possible issues include gaps in the table where a valid observation is not available (for example, due to a failure in observing equipment). These gaps are recorded as NA, which reduces the number of usable observations for some variables.
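The extent of these gaps can be measured per variable. In the summary above, for instance, Sunshine is missing in 67816 of the 142193 rows. The following is a minimal sketch on a made-up data frame (`toy` and its values are assumptions for illustration); the same two calls would apply to `weather`.

```r
# Made-up data frame with gaps, standing in for the weather data
toy = data.frame(
  MinTemp  = c(13.4, NA, 12.9, 9.2),
  Sunshine = c(NA, NA, 8.5, NA),
  Rainfall = c(0.6, 0, 0, 1)
)

# is.na() gives a logical matrix; column means are the share of missing values
na_share = colMeans(is.na(toy)) * 100
sort(na_share, decreasing = TRUE)

# Number of rows that are complete across every column
sum(complete.cases(toy))
```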
Possible stakeholders include:
Weather describes a combination of meteorological factors such as rainfall, temperature, humidity, wind speed and wind direction. While weather describes conditions over a short period of time, climate describes the long-term patterns in weather conditions for a region.
Climate and weather data matter to a wide range of industries, including agriculture, tourism and renewable energy. By observing climate data over time, we can analyse trends and predict future climate behaviour, then apply this research to specific industries to improve their output and efficiency.
What makes a location good for agricultural production?
# Creating a data set that summarises each location by its chance of receiving rainfall
percent_rain_data = weather %>%
group_by(Location) %>%
summarise(percent_rain = mean(RainToday == "TRUE", na.rm = TRUE)*100)
# Bar plot to show the chance of a rainy day in each location
chance_rain = plot_ly(percent_rain_data, x = ~reorder(Location,-percent_rain), y = ~percent_rain, type = "bar", color = I("rgba(0,128,128,0.9)")) %>%
layout(title = "Chance of a Rainy Day Across Australia", xaxis = list(title="Location"), yaxis = list(title="Percentage (%)"))
chance_rain
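The percentage used in this plot is the mean of a logical vector within each location. A minimal base R sketch of the same idea, assuming a made-up data frame with two hypothetical locations "A" and "B":

```r
# Made-up observations for two hypothetical locations
toy = data.frame(
  Location  = c("A", "A", "A", "A", "B", "B", "B", "B"),
  RainToday = c(TRUE, FALSE, NA, FALSE, TRUE, TRUE, FALSE, NA)
)

# The mean of a logical vector is the proportion of TRUE values;
# na.rm = TRUE drops days with no rainfall record from the denominator
percent_rain = tapply(toy$RainToday, toy$Location, mean, na.rm = TRUE) * 100
percent_rain
```

Because days without a record are dropped rather than counted as dry, a location with many missing days is summarised from fewer observations than the others.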
From the bar plot above, the top five most consistent locations for rainfall in Australia appear to be:
However, we should look at the amount of rainfall received by these locations on rainy days. This is represented below:
# Creating a data set that summarises each location by its average rainfall on rainy days
mean_rain_data = weather %>%
group_by(Location) %>%
filter(Rainfall > 0) %>%
summarise(mean_rainfall = mean(Rainfall))
# Bar plot to show the average rainfall in each location on rainy days
mean_rain = plot_ly(mean_rain_data, x = ~reorder(Location, -mean_rainfall), y = ~mean_rainfall, type ="bar", color = I("rgba(0,128,128,0.9)")) %>%
layout(title = "Average Rainfall on Rainy Days Across Australia", xaxis = list(title = "Location"), yaxis = list(title = "Mean Rainfall (mm)"))
mean_rain
# Creating a data set that summarises each location by its median rainfall on rainy days
median_rain_data = weather %>%
group_by(Location) %>%
filter(Rainfall > 0) %>%
summarise(median_rainfall = median(Rainfall))
# Bar plot to show the median rainfall in each location on rainy days
median_rain = plot_ly(median_rain_data, x = ~reorder(Location, -median_rainfall), y = ~median_rainfall, type = "bar", color = I('rgba(0,128,128,0.9)')) %>%
layout(title = "Median Rainfall on Rainy Days Across Australia", xaxis = list(title = "Location"), yaxis = list(title = "Median Rainfall (mm)"))
median_rain
According to the two bar plots above, three locations consistently appear in the top five locations for both their mean and median rainfall. These are:
However, although Katherine receives high rainfall on rainy days, it rains there on only 17% of days each year on average. Its rainfall is therefore highly inconsistent, and it cannot be regarded as an ideal location for farming.
On the other hand, it rains in Darwin on 27% of days each year on average, with a median rainfall of 7.4 mm, making it a very consistent location for high rainfall. Furthermore, Cairns appears in our previous bar plot as the 3rd most consistent location for rainfall, at 32% of days each year on average.
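The reason for looking at both the mean and the median is that a few extreme days can inflate the mean. A minimal sketch with made-up rainfall values for two hypothetical locations (`steady` and `bursty` are illustrative names, not locations in the data):

```r
# Made-up rainfall (mm) on rainy days at two hypothetical locations
steady = c(6, 7, 8, 7, 7)
bursty = c(1, 1, 2, 1, 30)

# Both locations have the same mean rainfall on rainy days
mean(steady)
mean(bursty)

# The median exposes the difference: most rainy days at "bursty" are light
median(steady)
median(bursty)
```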
Yet, an optimal location for agricultural production also requires plenty of sunshine:
# Box plot to show the number of hours of bright sunshine by location
sunshine = plot_ly(weather, x = ~Sunshine, y = ~Location, type = "box", color = ~Location, marker = list(size = 5, opacity = 0.2)) %>%
layout(title = 'Sunshine Across Australia From 2007-2017', yaxis = list(title = 'Locations', autorange = TRUE, categoryorder = "category descending"), xaxis = list(title = 'Sunshine (Hours per Day)'))
sunshine
From the box plot representing hours of sunshine across Australia, notice that Darwin has a relatively high median of 10 hours of sunshine per day and a reasonably low IQR (interquartile range) of 4 hours, indicating a consistently large number of hours of sunshine per day. Cairns also has a high median of 8.6 hours of sunshine per day, alongside an adequate IQR of 5.5 hours.
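The median and IQR read off the box plot can also be computed directly. The following is a minimal sketch on made-up sunshine values for two hypothetical locations; note that plotly may compute quartiles slightly differently from R’s default quantile method, so values can differ a little from the plot.

```r
# Made-up daily sunshine hours for two hypothetical locations
toy = data.frame(
  Location = rep(c("A", "B"), each = 5),
  Sunshine = c(9, 10, 10, 11, NA, 2, 6, 8, 10, 12)
)

# Median and interquartile range per location, ignoring missing days
med = tapply(toy$Sunshine, toy$Location, median, na.rm = TRUE)
iqr = tapply(toy$Sunshine, toy$Location, IQR, na.rm = TRUE)

# Collect both summaries in one table
data.frame(Location = names(med),
           median_sunshine = as.vector(med),
           iqr_sunshine = as.vector(iqr))
```

A high median with a low IQR, as for location "A" here, is the pattern described above for Darwin.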
This further suggests that Darwin and Cairns are well suited to agriculture, and are perhaps the most favourable locations in Australia.
However, other factors must also be considered. One such factor is Darwin’s built-up urban area and the resulting lack of free land for agriculture. Another is the possibility of flash flooding, evident in Darwin, where one day in 2011 received 367.6 mm of rain, the 2nd highest single-day rainfall recorded in this data set.
Similarly, Cairns must also be assessed more deeply. Like most of North Queensland, Cairns is prone to tropical cyclones, which would heavily influence any decision to establish agriculture in the region.
To best address this question, we shall analyse the data obtained from Sydney, Alice Springs and Darwin since they represent significantly different geographical locations across Australia: temperate sub-tropical, hot desert and tropical savanna climates respectively.
# Creating a data set that splits the date column into year, month and day
date_split_data = weather %>%
tidyr::separate(col = Date,
into = c("year", "month", "day"),
sep = "-")
# Adding a column for the average temperature
date_split_data = mutate(date_split_data, mean_temp = (MaxTemp+MinTemp)/2)
# Adding a column for the day of the week
day_name = format(as.Date(weather$Date), "%A")
date_split_data = cbind(date_split_data, day_name)
# Adding a column for the month name as a factor
month_name = format(as.Date(weather$Date), "%B")
date_split_data = cbind(date_split_data, month_name)
# Reordering the columns
date_split_data = date_split_data[, c(1, 2, 29, 3, 28, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27)]
# Ordering the levels of the month column chronologically
date_split_data$month_name = factor(date_split_data$month_name, c("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"))
# Creating a data set that summarises each location by its average monthly temperature
mean_temp_month_data = date_split_data %>%
group_by(Location, year, month_name) %>%
summarise(mean_temp_month = mean(mean_temp, na.rm = TRUE))
# Line chart to show fluctuations in the average monthly temperature each year for Sydney
mean_temp_sydney = plot_ly(mean_temp_month_data, x = ~month_name, y = ~mean_temp_month, color = ~year, text = ~Location, hoverinfo = "text") %>%
filter(Location == "Sydney") %>%
add_trace(type = "scatter", mode = "lines", line = list(shape = "spline"), colors = c("#edf8b1", "#7fcdbb", "#2c7fb8")) %>%
layout(title = "Average Temperature in Sydney Per Month", xaxis = list(title = "Month", range = c(0,11)), yaxis = list(title = "Mean Temperature (°C)", range = c(0,35)))
mean_temp_sydney
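The monthly averaging behind this chart can be sketched in base R on a few made-up records (`toy` and its values are assumptions for illustration). The sketch uses the built-in `month.name` constant for the factor levels, which avoids typing the twelve month names by hand and does not depend on the session locale.

```r
# Made-up daily records for one hypothetical location
toy = data.frame(
  Date    = as.Date(c("2015-01-10", "2015-01-20", "2015-07-05", "2015-07-15")),
  MinTemp = c(19, 21, 8, 10),
  MaxTemp = c(27, 29, 16, 18)
)

# Daily mean temperature as the midpoint of the minimum and maximum
toy$mean_temp = (toy$MaxTemp + toy$MinTemp) / 2

# Month number from the date, mapped onto month.name so that the
# factor levels run from January to December
toy$month_name = factor(month.name[as.integer(format(toy$Date, "%m"))],
                        levels = month.name)

# Average the daily means within each year and month
toy$year = format(toy$Date, "%Y")
monthly = aggregate(mean_temp ~ year + month_name, data = toy, FUN = mean)
monthly
```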
Sydney has a temperate sub-tropical climate.
According to the line chart above, this climate is characterised by a gradual change in temperature throughout the year, from 13-25°C, rather than by extreme seasonal differences. This moderation is likely due to Sydney’s proximity to the ocean. The trend also appears highly consistent over time, with only minor fluctuations between years.
Indeed, a temperate sub-tropical climate is known to exhibit a gradual shift between mild winters and warm summers, with the shape of the annual temperature graph indicating four distinct seasons.
# Line chart to show fluctuations in the average monthly temperature each year for Alice Springs
mean_temp_alicesprings = plot_ly(mean_temp_month_data, x = ~month_name, y = ~mean_temp_month, color = ~year, text = ~Location, hoverinfo = "text") %>%
filter(Location == "AliceSprings") %>%
add_trace(type = "scatter", mode = "lines", line = list(shape = "spline"), colors = c("#ffeda0", "#feb24c", "#f03b20")) %>%
layout(title = "Average Temperature in Alice Springs Per Month", xaxis = list(title = "Month", range = c(0,11)), yaxis = list(title = "Mean Temperature (°C)", range = c(0,35)))
mean_temp_alicesprings
Alice Springs has a hot desert climate.
From the line chart above, this climate appears to be characterised by high average temperatures in summer and low temperatures in winter. The greater spread in temperature values, from 9-30°C, reflects this.
A hot desert climate typically produces such a temperature-time graph: a four-season trend with significant seasonal differences and hence a steeper curve.
# Line chart to show fluctuations in the average monthly temperature each year for Darwin
mean_temp_darwin = plot_ly(mean_temp_month_data, x = ~month_name, y = ~mean_temp_month, color = ~year, text = ~Location, hoverinfo = "text") %>%
filter(Location == "Darwin") %>%
add_trace(type = "scatter", mode = "lines", line = list(shape = "spline"), colors = c("#e7e1ef", "#c994c7", "#dd1c77")) %>%
layout(title = "Average Temperature in Darwin Per Month", xaxis = list(title = "Month", range = c(0,11)), yaxis = list(title = "Mean Temperature (°C)", range = c(0,35)))
mean_temp_darwin
Darwin has a tropical savanna climate.
According to the line chart above, this type of climate appears to have far less distinct seasons. This is evident in the very high average temperature that persists throughout the year in Darwin with little variation, ranging only from 23-30°C.
In fact, instead of having four distinct seasons, a tropical savanna climate has distinct wet and dry seasons.
This characteristic is evident below:
# Changing RainToday to logical type
levels(date_split_data$RainToday)[1] = FALSE
levels(date_split_data$RainToday)[2] = TRUE
date_split_data = date_split_data %>%
mutate(RainToday = as.logical(RainToday))
# Changing RainTomorrow to logical type
levels(date_split_data$RainTomorrow)[1] = FALSE
levels(date_split_data$RainTomorrow)[2] = TRUE
date_split_data = date_split_data %>%
mutate(RainTomorrow = as.logical(RainTomorrow))
# Creating a data set that summarises rainfall and temperature for each location, year and month
daily_summary = date_split_data %>%
group_by(Location, year, month_name) %>%
summarise(mean_daily_rain = mean(Rainfall, na.rm=TRUE),
median_daily_rain = median(Rainfall, na.rm = TRUE),
total_monthly_rain = sum(Rainfall, na.rm = TRUE),
max_daily_rain = max(Rainfall, na.rm=TRUE),
min_daily_rain = min(Rainfall, na.rm=TRUE),
mean_max_temp = mean(MaxTemp, na.rm=TRUE),
mean_min_temp = mean(MinTemp, na.rm=TRUE),
median_max_temp = median(MaxTemp, na.rm=TRUE),
median_min_temp = median(MinTemp, na.rm=TRUE),
max_daily_temp = max(MaxTemp, na.rm=TRUE),
min_daily_temp = min(MinTemp, na.rm=TRUE),
median_con_rain = median(rle(RainToday)$lengths[rle(RainToday)$values==TRUE], na.rm=TRUE))
# Bar plot of the mean rainfall for each month in Darwin
darwin_weather = subset(daily_summary, Location == "Darwin")
ggplot(darwin_weather, aes(x = month_name, y = mean_daily_rain, fill = year)) + geom_bar(stat = "identity") + ggtitle("Average Rainfall per Month in Darwin") + xlab("Month") + ylab("Rainfall (mm)") + theme(axis.text.x = element_text(angle = 45, hjust = 1))
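The median_con_rain column in the summary above relies on run-length encoding to measure spells of consecutive rainy days. A minimal sketch of that step on a made-up vector (`rain_today` here is a hypothetical month of observations):

```r
# Made-up daily rain indicator for one hypothetical month
rain_today = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE)

# rle() collapses the vector into runs: each run has a value and a length
runs = rle(rain_today)

# Keep only the runs of rainy days; %in% TRUE also skips any NA runs
wet_spells = runs$lengths[runs$values %in% TRUE]
wet_spells

# Typical and longest wet spell
median(wet_spells)
max(wet_spells)
```

Storing the result of rle() once, as here, avoids calling it twice on the same vector. rle() treats every missing value as its own run, so a gap in the record splits a wet spell in two.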
# Creating a data set that adds a column for the average wind speed in each observation
mean_wind_speed_data = weather %>%
mutate(mean_wind_speed = (WindSpeed3pm + WindSpeed9am)/2)
# Box plot to show the average wind speed by location
wind_speed = plot_ly(mean_wind_speed_data, y = ~mean_wind_speed, x = ~Location, type = "box", color = ~Location, marker = list(size = 3, opacity = 0.9)) %>%
layout(title = "Average Wind Speed Across Australia From 2007-2017", yaxis = list(title = "Wind Speed (km/hr)"), xaxis = list(title = "Locations"))
wind_speed
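One detail of the averaging step above is that the sum of the two readings is NA whenever either one is missing, so those days drop out of the box plot. The sketch below, on made-up values, contrasts this with averaging whichever readings are available; using rowMeans() here is an alternative offered for illustration, not the method used in this report.

```r
# Made-up wind speeds (km/h); rows 2 and 3 lack one reading, row 4 lacks both
toy = data.frame(
  WindSpeed9am = c(20, NA, 19, NA),
  WindSpeed3pm = c(24, 22, NA, NA)
)

# Arithmetic mean as in the report: NA if either reading is missing
(toy$WindSpeed3pm + toy$WindSpeed9am) / 2

# Alternative: average whichever readings are available
mean_wind_speed = rowMeans(toy[, c("WindSpeed9am", "WindSpeed3pm")], na.rm = TRUE)

# rowMeans() gives NaN when both readings are missing; recode that as NA
mean_wind_speed[is.nan(mean_wind_speed)] = NA
mean_wind_speed
```

Whether a single reading is a fair stand-in for the daily average is a judgment call, since 9am and 3pm wind speeds differ systematically in the summary above.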
Despite the broad consistency of wind speed across locations, the medians and IQRs in the comparative box plot above suggest that the best locations for generating wind-powered energy would be:
However, it is unrealistic in practice to develop a system of wind turbines in a location as congested as Sydney Airport. Hence we shall disregard such locations.
Similarly, we can also disregard Mount Gambier and Norfolk Island due to the technical challenge of setting up a wind turbine system on a tall mountain and a small island respectively.
Melbourne, Woomera and Darwin still seem appealing locations, though Woomera would have far more free land on which to construct wind turbines.
Now let’s consider sunshine as well:
# Creating a data set that summarises each location by its median sunshine and wind speed
median_sunshine_and_wind_speed_data = mean_wind_speed_data %>%
group_by(Location) %>%
summarise(median_sunshine = median(Sunshine, na.rm = TRUE), median_wind_speed = median(mean_wind_speed, na.rm = TRUE))
# Scatter plot to show the median hours of bright sunshine against the median wind speed
sunshine_wind_speed = plot_ly(median_sunshine_and_wind_speed_data, x =~median_wind_speed, y = ~median_sunshine, type = "scatter", mode = "markers", color = ~Location) %>%
layout(title = 'Median Hours of Sunshine VS Median Wind Speed', yaxis = list(title = 'Median Hours of Sunshine per Day', autorange = TRUE), xaxis = list(title = 'Median Wind Speed (km/h)'))
sunshine_wind_speed
According to the scatter plot above, Woomera is the best location for generating renewable energy, with a median of 10 hours of sunshine per day and a median wind speed of 19.5 km/h.
In contrast, despite having a median wind speed of 18.5 km/h, Melbourne has a median of only 6.7 hours of sunshine per day, which limits its potential for solar energy.
Yet Darwin also seems well suited, with a median wind speed of 17.5 km/h and a median of 10 hours of sunshine per day. In fact, its wind speed has a much smaller IQR of 6.5 km/h, compared with 10.5 km/h for Woomera.
Thus, considering all the factors, it appears that Woomera and Darwin are the best locations for generating renewable energy in Australia.
sessionInfo()
## R version 3.5.2 (2018-12-20)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17134)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252
## [3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
## [5] LC_TIME=English_Australia.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] psych_1.8.12 caret_6.0-81 lattice_0.20-38 plotly_4.8.0
## [5] forcats_0.4.0 stringr_1.4.0 dplyr_0.8.0.1 purrr_0.3.1
## [9] readr_1.3.1 tidyr_0.8.3 tibble_2.0.1 ggplot2_3.1.0
## [13] tidyverse_1.2.1 kableExtra_1.0.1 magrittr_1.5 knitr_1.21
##
## loaded via a namespace (and not attached):
## [1] httr_1.4.0 jsonlite_1.6 viridisLite_0.3.0
## [4] splines_3.5.2 foreach_1.4.4 prodlim_2018.04.18
## [7] modelr_0.1.4 shiny_1.2.0 assertthat_0.2.0
## [10] highr_0.7 stats4_3.5.2 cellranger_1.1.0
## [13] yaml_2.2.0 ipred_0.9-8 pillar_1.3.1
## [16] backports_1.1.3 glue_1.3.1 digest_0.6.18
## [19] RColorBrewer_1.1-2 promises_1.0.1 rvest_0.3.2
## [22] colorspace_1.4-0 recipes_0.1.4 httpuv_1.4.5.1
## [25] htmltools_0.3.6 Matrix_1.2-15 plyr_1.8.4
## [28] timeDate_3043.102 pkgconfig_2.0.2 broom_0.5.1
## [31] haven_2.1.0 xtable_1.8-3 scales_1.0.0
## [34] webshot_0.5.1 later_0.8.0 gower_0.2.0
## [37] lava_1.6.5 generics_0.0.2 withr_2.1.2
## [40] nnet_7.3-12 lazyeval_0.2.1 cli_1.0.1
## [43] mnormt_1.5-5 mime_0.6 survival_2.43-3
## [46] crayon_1.3.4 readxl_1.3.1 evaluate_0.13
## [49] nlme_3.1-137 MASS_7.3-51.1 foreign_0.8-71
## [52] xml2_1.2.0 class_7.3-14 tools_3.5.2
## [55] data.table_1.12.0 hms_0.4.2 munsell_0.5.0
## [58] compiler_3.5.2 rlang_0.3.1 grid_3.5.2
## [61] iterators_1.0.10 rstudioapi_0.9.0 htmlwidgets_1.3
## [64] crosstalk_1.0.0 labeling_0.3 rmarkdown_1.11
## [67] gtable_0.2.0 ModelMetrics_1.2.2 codetools_0.2-15
## [70] reshape2_1.4.3 R6_2.4.0 lubridate_1.7.4
## [73] stringi_1.4.3 parallel_3.5.2 Rcpp_1.0.0
## [76] rpart_4.1-13 tidyselect_0.2.5 xfun_0.5
Kaggle.com. (2019). Rain in Australia. [online] Available at: https://www.kaggle.com/jsphyg/weather-dataset-rattle-package [Accessed 13 Mar. 2019].
Bom.gov.au. (2019). Climate Data Online. [online] Available at: http://www.bom.gov.au/climate/data/ [Accessed 13 Mar. 2019].
Weatheronline.co.uk. (2019). Climate of the World: Australia | weatheronline.co.uk. [online] Available at: https://www.weatheronline.co.uk/reports/climate/Australia.htm [Accessed 13 Mar. 2019].
Colorbrewer2.org. (2019). ColorBrewer: Color Advice for Maps. [online] Available at: http://colorbrewer2.org/#type=qualitative&scheme=Set1&n=3 [Accessed 20 Mar. 2019].
19january2017snapshot.epa.gov. (2019). Climate Impacts on Agriculture and Food Supply | Climate Change Impacts | US EPA. [online] Available at: https://19january2017snapshot.epa.gov/climate-impacts/climate-impacts-agriculture-and-food-supply_.html [Accessed 16 Mar. 2019].